
    Bayesian Multimodel Inference for Geostatistical Regression Models

    The problem of simultaneous covariate selection and parameter inference for spatial regression models is considered. Previous research has shown that failure to take spatial correlation into account can influence the outcome of standard model selection methods. A Markov chain Monte Carlo (MCMC) method is investigated for the calculation of parameter estimates and posterior model probabilities for spatial regression models. The method can accommodate normal and non-normal response data and a large number of covariates. Thus the method is very flexible and can be used to fit spatial linear models, spatial linear mixed models, and spatial generalized linear mixed models (GLMMs). The Bayesian MCMC method also allows a priori unequal weighting of covariates, which is not possible with many model selection methods such as Akaike's information criterion (AIC). The proposed method is demonstrated on two data sets. The first is the whiptail lizard data set which has been previously analyzed by other researchers investigating model selection methods. Our results confirmed the previous analysis suggesting that sandy soil and ant abundance were strongly associated with lizard abundance. The second data set concerned pollution tolerant fish abundance in relation to several environmental factors. Results indicate that abundance is positively related to Strahler stream order and a habitat quality index. Abundance is negatively related to percent watershed disturbance
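
    The covariate-selection idea in this abstract can be illustrated with a toy, non-spatial sketch: posterior probabilities over covariate subsets with unequal prior inclusion weights. Here a BIC approximation stands in for the paper's MCMC integration, and `model_posteriors` and its arguments are hypothetical names, not the authors' code.

```python
import itertools
import numpy as np

def model_posteriors(X, y, prior_incl):
    """Posterior probabilities over covariate subsets (toy, non-spatial sketch).

    prior_incl[j] is the a priori inclusion probability of covariate j,
    allowing unequal weighting of covariates; marginal likelihoods are
    approximated with exp(-BIC/2) instead of the paper's MCMC integration.
    """
    n, p = X.shape
    bics, priors = {}, {}
    for subset in itertools.product([0, 1], repeat=p):
        cols = [j for j, s in enumerate(subset) if s]
        Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        rss = np.sum((y - Z @ beta) ** 2)
        bics[subset] = n * np.log(rss / n) + Z.shape[1] * np.log(n)
        priors[subset] = np.prod([prior_incl[j] if s else 1 - prior_incl[j]
                                  for j, s in enumerate(subset)])
    bmin = min(bics.values())  # subtract min BIC for numerical stability
    weights = {m: np.exp(-0.5 * (bics[m] - bmin)) * priors[m] for m in bics}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}
```

    Unequal prior weights simply tilt the posterior toward covariates believed a priori to matter, something AIC-style selection cannot express.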

    Bootstrap-after-Bootstrap Model Averaging for Reducing Model Uncertainty in Model Selection for Air Pollution Mortality Studies

    Background: Concerns have been raised about findings of associations between particulate matter (PM) air pollution and mortality that have been based on a single "best" model arising from a model selection procedure, because such a strategy may ignore model uncertainty inherently involved in searching through a set of candidate models to find the best model. Model averaging has been proposed as a method of allowing for model uncertainty in this context. Objectives: To propose an extension (double BOOT) to a previously described bootstrap model-averaging procedure (BOOT) for use in time series studies of the association between PM and mortality. We compared double BOOT and BOOT with Bayesian model averaging (BMA) and a standard method of model selection [standard Akaike's information criterion (AIC)]. Method: Actual time series data from the United States are used to conduct a simulation study to compare and contrast the performance of double BOOT, BOOT, BMA, and standard AIC. Results: Double BOOT produced estimates of the effect of PM on mortality that had smaller root mean squared error than those produced by BOOT, BMA, and standard AIC. This performance boost resulted from estimates produced by double BOOT having smaller variance than those produced by BOOT and BMA. Conclusions: Double BOOT is a viable alternative to BOOT and BMA for producing estimates of the mortality effect of PM. Keywords: air pollution, Bayesian, bootstrap, model averaging, mortality, particulate matter. Environ Health Perspect 118:131–136 (2010). doi:10.1289/ehp.0901007
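
    A single-level bootstrap model-averaging pass (BOOT-style; the paper's double BOOT adds a second resampling layer) can be sketched as follows. The function name, the linear-model candidates, and the Gaussian AIC formula are illustrative assumptions, not the study's implementation.

```python
import numpy as np

def boot_model_average(X, y, candidate_cols, target_col, n_boot=200, seed=0):
    """One-level bootstrap model averaging (BOOT-style sketch).

    For each resample, the candidate model with the lowest AIC is selected
    and the coefficient of `target_col` from that winning model is recorded;
    averaging over resamples averages over model-selection uncertainty.
    Every candidate column subset is assumed to include target_col.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # resample rows with replacement
        Xb, yb = X[idx], y[idx]
        best_aic, best_est = np.inf, None
        for cols in candidate_cols:
            Z = np.column_stack([np.ones(n)] + [Xb[:, j] for j in cols])
            beta, *_ = np.linalg.lstsq(Z, yb, rcond=None)
            rss = np.sum((yb - Z @ beta) ** 2)
            aic = n * np.log(rss / n) + 2 * Z.shape[1]
            if aic < best_aic:
                best_aic = aic
                best_est = beta[1 + cols.index(target_col)]
        estimates.append(best_est)
    return np.mean(estimates), np.std(estimates, ddof=1)
```

    The bootstrap spread of the selected-model estimates is what captures the extra variance that a single "best" model would hide.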

    Fuzzy Fibers: Uncertainty in dMRI Tractography

    Fiber tracking based on diffusion weighted Magnetic Resonance Imaging (dMRI) allows for noninvasive reconstruction of fiber bundles in the human brain. In this chapter, we discuss sources of error and uncertainty in this technique, and review strategies that afford a more reliable interpretation of the results. This includes methods for computing and rendering probabilistic tractograms, which estimate precision in the face of measurement noise and artifacts. However, we also address aspects that have received less attention so far, such as model selection, partial voluming, and the impact of parameters, both in preprocessing and in fiber tracking itself. We conclude by giving impulses for future research

    Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines

    Background: Multiple imputation (MI) provides an effective approach to handle missing covariate data within prognostic modelling studies, as it can properly account for the missing data uncertainty. The multiply imputed datasets are each analysed using standard prognostic modelling techniques to obtain the estimates of interest. The estimates from each imputed dataset are then combined into one overall estimate and variance, incorporating both the within and between imputation variability. Rubin's rules for combining these multiply imputed estimates are based on asymptotic theory. The resulting combined estimates may be more accurate if the posterior distribution of the population parameter of interest is better approximated by the normal distribution. However, the normality assumption may not be appropriate for all the parameters of interest when analysing prognostic modelling studies, such as predicted survival probabilities and model performance measures. Methods: Guidelines for combining the estimates of interest when analysing prognostic modelling studies are provided. A literature review is performed to identify current practice for combining such estimates in prognostic modelling studies. Results: Methods for combining all reported estimates after MI were not well reported in the current literature. Rubin's rules without applying any transformations were the standard approach used, when any method was stated. Conclusion: The proposed simple guidelines for combining estimates after MI may lead to a wider and more appropriate use of MI in future prognostic modelling studies
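
    Rubin's rules referenced above have a compact form: the pooled point estimate is the mean across imputations, and the total variance combines within- and between-imputation components. A minimal sketch (the helper name is ours; transformations for bounded quantities such as survival probabilities would be applied before pooling):

```python
import numpy as np

def rubins_rules(estimates, variances):
    """Combine estimates from m multiply imputed datasets via Rubin's rules."""
    m = len(estimates)
    qbar = np.mean(estimates)          # pooled point estimate
    w = np.mean(variances)             # within-imputation variance
    b = np.var(estimates, ddof=1)      # between-imputation variance
    t = w + (1 + 1 / m) * b            # total variance
    return qbar, t
```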

    A heteroskedastic error covariance matrix estimator using a first-order conditional autoregressive Markov simulation for deriving asymptotical efficient estimates from ecological sampled Anopheles arabiensis aquatic habitat covariates

    Background: Autoregressive regression coefficients for Anopheles arabiensis aquatic habitat models are usually assessed using global error techniques and are reported as error covariance matrices. A global statistic, however, will summarize error estimates from multiple habitat locations. This makes it difficult to identify where there are clusters of An. arabiensis aquatic habitats of acceptable prediction. It is therefore useful to conduct some form of spatial error analysis to detect clusters of An. arabiensis aquatic habitats based on uncertainty residuals from individual sampled habitats. In this research, a method of error estimation for spatial simulation models was demonstrated using autocorrelation indices and eigenfunction spatial filters to distinguish among the effects of parameter uncertainty on a stochastic simulation of ecologically sampled Anopheles aquatic habitat covariates. A test for diagnostic checking of error residuals in an An. arabiensis aquatic habitat model may enable intervention efforts targeting productive habitat clusters, based on larval/pupal productivity, by using the asymptotic distribution of parameter estimates from a residual autocovariance matrix. The models considered in this research extend a normal regression analysis previously considered in the literature. Methods: Field and remote-sampled data were collected from July 2006 to December 2007 in the Karima rice-village complex in Mwea, Kenya. SAS 9.1.4® was used to explore univariate statistics, correlations, and distributions, and to generate global autocorrelation statistics from the ecologically sampled datasets. A local autocorrelation index was also generated using spatial covariance parameters (i.e., Moran's indices) in a SAS/GIS® database. The Moran's statistic was decomposed into orthogonal and uncorrelated synthetic map pattern components using a Poisson model with a gamma-distributed mean (i.e., negative binomial regression). The eigenfunction values from the spatial configuration matrices were then used to define expectations for prior distributions using a Markov chain Monte Carlo (MCMC) algorithm. A set of posterior means was defined in WinBUGS 1.4.3®. After the model had converged, samples from the conditional distributions were used to summarize the posterior distribution of the parameters. Thereafter, a spatial residual trend analysis was used to evaluate variance uncertainty propagation in the model using an autocovariance error matrix. Results: By specifying coefficient estimates in a Bayesian framework, the covariate number of tillers was found to be a significant predictor, positively associated with An. arabiensis aquatic habitats. The spatial filter models accounted for approximately 19% redundant locational information in the ecologically sampled An. arabiensis aquatic habitat data. In the residual error estimation model there was significant positive autocorrelation (i.e., clustering of habitats in geographic space) based on log-transformed larval/pupal data and the sampled covariate depth of habitat. Conclusion: An autocorrelation error covariance matrix and a spatial filter analysis can prioritize mosquito control strategies by providing a computationally attractive and feasible description of variance uncertainty estimates for correctly identifying clusters of prolific An. arabiensis aquatic habitats based on larval/pupal productivity.
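
    The global Moran's index used in the Methods above can be computed directly from site values and a spatial weights matrix. This minimal sketch omits the eigenfunction decomposition and is not the authors' SAS/GIS pipeline:

```python
import numpy as np

def morans_i(values, W):
    """Global Moran's I for values at n sites with an n x n spatial weights
    matrix W. Values near +1 indicate clustering of similar values in space,
    values near -1 indicate checkerboard-like dispersion."""
    z = values - values.mean()                 # deviations from the mean
    n = len(values)
    return (n * (z @ W @ z)) / (W.sum() * (z @ z))
```

    On a 4-site ring with alternating values the statistic is exactly -1, the fully dispersed case.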

    Bayesian probabilistic network modeling from multiple independent replicates

    Often protein (or gene) time-course data are collected for multiple replicates. Each replicate generally has sparse data with the number of time points being less than the number of proteins. Usually each replicate is modeled separately. However, here all the information in each of the replicates is used to make a composite inference about signal networks. The composite inference comes from combining well structured Bayesian probabilistic modeling with a multi-faceted Markov Chain Monte Carlo algorithm. Based on simulations which investigate many different types of network interactions and experimental variabilities, the composite examination uncovers many important relationships within the networks. In particular, when the edge's partial correlation between two proteins is at least moderate, then the composite's posterior probability is large
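
    Partial correlation as an edge-strength measure can be read off the inverse covariance (precision) matrix. This is a simple frequentist stand-in for the abstract's Bayesian posterior edge probabilities, with hypothetical naming:

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation matrix from samples (rows = observations,
    columns = proteins). The correlation between two proteins, controlling
    for all others, is -prec_ij / sqrt(prec_ii * prec_jj)."""
    prec = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor
```

    In a chain x -> y -> z, the partial correlation between x and z given y is near zero, so only the direct edges survive, which is the behavior the network inference relies on.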

    Integrating Factor Analysis and a Transgenic Mouse Model to Reveal a Peripheral Blood Predictor of Breast Tumors

    Background: Transgenic mouse tumor models have the advantage of facilitating controlled in vivo oncogenic perturbations in a common genetic background. This provides an idealized context for generating transcriptome-based diagnostic models while minimizing the inherent noisiness of high-throughput technologies. However, the question remains whether models developed in such a setting are suitable prototypes for useful human diagnostics. We show that latent factor modeling of the peripheral blood transcriptome in a mouse model of breast cancer provides the basis for using computational methods to link a mouse model to a prototype human diagnostic based on a common underlying biological response to the presence of a tumor. Methods: We used gene expression data from mouse peripheral blood cell (PBC) samples to identify significantly differentially expressed genes using supervised classification and sparse ANOVA. We employed these transcriptome data as the starting point for developing a breast tumor predictor from human peripheral blood mononuclear cells (PBMCs) by using a factor modeling approach. Results: The predictor distinguished breast cancer patients from healthy individuals, in a cohort of patients independent from that used to build the factors and train the model, with 89% sensitivity, 100% specificity and an area under the curve (AUC) of 0.97, using Youden's J-statistic to objectively select the model's classification threshold. Both permutation testing of the model and evaluating the model strategy by swapping the training and validation sets highlight its stability. Conclusions: We describe a human breast tumor predictor based on the gene expression of mouse PBCs. This strategy overcomes many of the limitations of earlier studies by using the model system to reduce noise and identify transcripts associated with the presence of a breast tumor over other potentially confounding factors. Our results serve as a proof-of-concept for using an animal model to develop a blood-based diagnostic, and they establish an experimental framework for identifying predictors of solid tumors, not only in the context of breast cancer, but also in other types of cancer.
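
    Youden's J-statistic threshold selection, used above to set the classifier's operating point, is simple to sketch. This generic version assumes higher scores indicate "tumor" and is not the authors' code:

```python
import numpy as np

def youden_threshold(scores, labels):
    """Pick the classification threshold maximizing J = sensitivity + specificity - 1."""
    best_j, best_t = -1.0, None
    for t in np.unique(scores):
        pred = scores >= t
        sens = np.mean(pred[labels == 1])     # true positive rate
        spec = np.mean(~pred[labels == 0])    # true negative rate
        j = sens + spec - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j
```

    On perfectly separable scores the chosen threshold sits at the lowest positive-class score and J reaches its maximum of 1.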

    Dose–responses from multi-model inference for the non-cancer disease mortality of atomic bomb survivors

    The non-cancer mortality data for cerebrovascular disease (CVD) and cardiovascular diseases from Report 13 on the atomic bomb survivors published by the Radiation Effects Research Foundation were analysed to investigate the dose–response for the influence of radiation on these detrimental health effects. Various parametric and categorical models (such as linear-no-threshold (LNT) and a number of threshold and step models) were analysed with a statistical selection protocol that rated the model description of the data. Instead of applying the usual approach of identifying one preferred model for each data set, a set of plausible models was applied, and a sub-set of non-nested models was identified that all fitted the data about equally well. Subsequently, this sub-set of non-nested models was used to perform multi-model inference (MMI), an innovative method of mathematically combining different models to allow risk estimates to be based on several plausible dose–response models rather than just relying on a single model of choice. This procedure thereby produces more reliable risk estimates based on a more comprehensive appraisal of model uncertainties. For CVD, MMI yielded a weak dose–response (with a risk estimate of about one-third of the LNT model) below a step at 0.6 Gy and a stronger dose–response at higher doses. The calculated risk estimates are consistent with zero risk below this threshold-dose. For mortalities related to cardiovascular diseases, an LNT-type dose–response was found with risk estimates consistent with zero risk below 2.2 Gy based on 90% confidence intervals. The MMI approach described here resolves a dilemma in practical radiation protection when one is forced to select between models with profoundly different dose–responses for risk estimates
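
    One standard way to combine a sub-set of roughly equally well-fitting models is Akaike-weight averaging (the paper's exact weighting scheme may differ); a minimal sketch:

```python
import numpy as np

def multimodel_estimate(aics, estimates):
    """Akaike-weight model averaging over a set of plausible models.

    w_i = exp(-0.5 * delta_i) / sum_j exp(-0.5 * delta_j),
    where delta_i = AIC_i - min(AIC). The averaged risk estimate is the
    weight-combined estimate across all plausible dose-response models.
    """
    aics = np.asarray(aics, float)
    delta = aics - aics.min()
    w = np.exp(-0.5 * delta)
    w /= w.sum()
    return w, float(w @ np.asarray(estimates, float))
```

    Two models differing by 2 AIC units receive weights of roughly 0.73 and 0.27, so the better-fitting model dominates without the worse one being discarded outright.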

    Uncertainty analysis using Bayesian Model Averaging: a case study of input variables to energy models and inference to associated uncertainties of energy scenarios

    Background: Energy models are used to illustrate, calculate and evaluate energy futures under given assumptions. The results of energy models are energy scenarios representing uncertain energy futures. Methods: The discussed approach for uncertainty quantification and evaluation is based on Bayesian Model Averaging for input variables to quantitative energy models. If the premise is accepted that the energy model results cannot be less uncertain than the input to energy models, the proposed approach provides a lower bound of the associated uncertainty. The evaluation of model-based energy scenario uncertainty in terms of input variable uncertainty, departing from a probabilistic assessment, is discussed. Results: The result is an explicit uncertainty quantification for input variables of energy models based on well-established measure and probability theory. The quantification of uncertainty helps to assess the predictive potential of energy scenarios and allows an evaluation of possible consequences as promoted by energy scenarios in a highly uncertain economic, environmental, political and social target system. Conclusions: If societal decisions are vested in computed model results, it is meaningful to accompany these with an uncertainty assessment. Bayesian Model Averaging (BMA) for input variables of energy models could add to the currently limited tools for uncertainty assessment of model-based energy scenarios
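
    BMA over an input variable yields a mixture predictive distribution; sampling from it gives the lower-bound uncertainty the abstract describes. A minimal sketch with hypothetical sampler callables:

```python
import numpy as np

def bma_mixture_sample(model_probs, model_samplers, n=10000, seed=0):
    """Draw samples of an energy-model input variable from a BMA mixture.

    Each candidate model contributes its predictive distribution (here a
    callable taking an rng), weighted by its posterior model probability;
    the mixture spread is the uncertainty any downstream scenario inherits.
    """
    rng = np.random.default_rng(seed)
    choices = rng.choice(len(model_probs), size=n, p=model_probs)
    return np.array([model_samplers[k](rng) for k in choices])
```

    If two equally probable models predict very different values, the mixture's variance far exceeds either component's, which is exactly the model uncertainty a single-model analysis would understate.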

    ART: A machine learning Automated Recommendation Tool for synthetic biology

    Biology has changed radically in the last two decades, transitioning from a descriptive science into a design science. Synthetic biology allows us to bioengineer cells to synthesize novel valuable molecules such as renewable biofuels or anticancer drugs. However, traditional synthetic biology approaches involve ad-hoc engineering practices, which lead to long development times. Here, we present the Automated Recommendation Tool (ART), a tool that leverages machine learning and probabilistic modeling techniques to guide synthetic biology in a systematic fashion, without the need for a full mechanistic understanding of the biological system. Using sampling-based optimization, ART provides a set of recommended strains to be built in the next engineering cycle, alongside probabilistic predictions of their production levels. We demonstrate the capabilities of ART on simulated data sets, as well as experimental data from real metabolic engineering projects producing renewable biofuels, hoppy flavored beer without hops, and fatty acids. Finally, we discuss the limitations of this approach, and the practical consequences of the underlying assumptions failing
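
    The recommend-then-predict loop can be caricatured with a bootstrapped surrogate ensemble standing in for ART's probabilistic model; the function name and the linear surrogate are illustrative assumptions, not ART's implementation.

```python
import numpy as np

def recommend_strains(X_obs, y_obs, candidates, n_top=5, n_models=50, seed=0):
    """ART-style sketch: probabilistic surrogate + ranking of candidates.

    An ensemble of bootstrapped linear surrogates predicts production for
    candidate designs; the ensemble spread gives a rough predictive
    uncertainty, and the top candidates by mean predicted production are
    recommended for the next engineering cycle.
    """
    rng = np.random.default_rng(seed)
    n = len(y_obs)
    C = np.column_stack([np.ones(len(candidates)), candidates])
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, n)          # bootstrap the training data
        Z = np.column_stack([np.ones(n), X_obs[idx]])
        beta, *_ = np.linalg.lstsq(Z, y_obs[idx], rcond=None)
        preds.append(C @ beta)
    preds = np.array(preds)
    mean, sd = preds.mean(axis=0), preds.std(axis=0)
    order = np.argsort(-mean)[:n_top]        # best predicted producers first
    return order, mean[order], sd[order]
```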